Evaluating fatigue in patients recovering from COVID-19: validation of the fatigue severity scale and single item screening questions

Background Fatigue is a common symptom in hospitalized and non-hospitalized patients recovering from COVID-19, but no fatigue measurement scales or questions have been validated in these populations. The objective of this study was to perform validity assessments of the fatigue severity scale (FSS) and two single-item screening questions (SISQs) for fatigue in patients recovering from COVID-19. Methods We examined patients ≥ 28 days after their first SARS-CoV-2 infection who were hospitalized for their acute illness, as well as non-hospitalized patients referred for persistent symptoms. Patients completed questionnaires through one of four Post COVID-19 Recovery Clinics in British Columbia, Canada. Construct validity was assessed by comparing FSS scores to quality of life and depression measures. Two SISQs were evaluated based on their ability to classify fatigue (FSS score ≥ 4). Results Questionnaires were returned by 548 hospitalized and 546 non-hospitalized patients, with scores computable in 96.4% and 98.2% of patients, respectively. Cronbach’s alpha was 0.96 in both groups. The mean ± SD FSS score was 4.4 ± 1.8 in the hospitalized and 5.2 ± 1.6 in the non-hospitalized group, with 62.5% of hospitalized and 78.9% of non-hospitalized patients classified as fatigued. Ceiling effects were 7.6% in the hospitalized and 16.1% in the non-hospitalized patients. FSS scores negatively correlated with EQ-5D scores in both groups (Spearman’s rho −0.6 in both hospitalized and non-hospitalized; p < 0.001) and were higher among patients with a positive PHQ-2 depression screen (5.4 vs. 4.0 in hospitalized and 5.9 vs. 4.9 in non-hospitalized; p < 0.001). An SISQ asking whether there was “fatigue present” had a sensitivity of 70.6% in hospitalized and 83.2% in non-hospitalized patients; the “always feeling tired” SISQ had a sensitivity of 70.5% and 89.6%, respectively. Conclusions Fatigue was common and severe in patients referred for post COVID-19 assessment. 
Overall, the FSS is suitable for measuring fatigue in these patients, as there was excellent data quality, strong internal consistency, and construct validity. However, ceiling effects may be a limitation in the non-hospitalized group. SISQs had good sensitivity for identifying clinically relevant fatigue in non-hospitalized patients but only moderate sensitivity in the hospitalized group, indicating that there were more false negatives. Supplementary Information The online version contains supplementary material available at 10.1186/s12955-022-02082-x.

COVID-19, regardless of whether they were hospitalized for their initial illness [1]. In many studies, fatigue is the most commonly reported symptom [1,2,3,4]. However, reports of the proportion of patients who endorse fatigue have varied greatly, from as low as 1.8% in one study to as high as 98% in another [1,2,3,4]. This variability could be explained in part by differences in study populations and sampling biases, but also by the means by which fatigue has been assessed [2]. In many of the observational studies thus far, fatigue was evaluated via interview or questionnaire as a single item within a symptom inventory [4,5,6,7,8,9,10]. For example, in a highly cited internet-based survey, respondents were asked to indicate the presence or absence of fatigue among a list of over 200 symptoms [7]. Other studies have used several different standardized instruments, including the fatigue severity scale (FSS) [11,12,13,14], the Chalder fatigue questionnaire (CFQ) [15,16], the modified fatigue impact scale (MFIS) [17,18] and the patient-reported outcomes measurement information system (PROMIS) global health instrument [19]. The heterogeneity of fatigue measurement in the literature has made it difficult to compare the prevalence of fatigue across studies and to appreciate the severity of the fatigue reported [2]. It is also unclear how well the single item screening questions (SISQs) used in symptom inventories identify fatigue relative to the more detailed instruments.
To better characterize post-COVID-19 fatigue and assess the efficacy of interventions, further research studies worldwide would benefit from using survey instruments that have been specifically validated in this population. Although several fatigue scales exist, none have been validated in either a previously hospitalized or non-hospitalized post-COVID-19 cohort.
Our current study focused on validation of the fatigue severity scale (FSS), a self-reported questionnaire designed to assess fatigue severity based on its impact on a patient's functioning [20]. The FSS is one of the most widely used measures of fatigue and has been validated in several health conditions [21]. An evaluation of the psychometric properties of the FSS specifically in post-COVID-19 patients would help researchers understand the strengths and weaknesses of the instrument when designing future studies and interpreting their results.
We sought to use data collected through the Post COVID-19 Recovery Clinics (PCRCs) in British Columbia (BC) to investigate the performance of the FSS in patients that were hospitalized for COVID-19 and in patients who were not hospitalized, but referred for persistent symptoms. As such, this cross-sectional study had two main objectives. The first aim was to assess the psychometric properties of the FSS in these patient groups, including data quality, internal consistency, and construct validity. The second aim was to determine how effective two different SISQs were at identifying fatigued patients using an FSS cut-off as the reference standard.

Participants and data collection
The study was conducted through the Post-COVID-19 Interdisciplinary Clinical Care Network (PC-ICCN), which was designed as a learning health system to facilitate both clinical care and research throughout BC, a Canadian province of approximately 5 million people [22,23]. At the time of this study, the PC-ICCN comprised 4 PCRCs, located in the outpatient departments of St. Paul's Hospital, Vancouver General Hospital, Surrey Memorial Hospital and Abbotsford Regional Hospital. At the clinics, patients are assessed by internal medicine physicians in person or by telehealth.
Patients were eligible to be referred by a clinician to the PC-ICCN if they were adults, and were either hospitalized for acute COVID-19 or were not hospitalized but were experiencing persistent symptoms following their initial infection. The program accepted referrals from the entire province. Information regarding whether the patient was admitted to hospital or intensive care unit (ICU) was indicated by the referring practitioner.
Prior SARS-CoV-2 infection was confirmed for each patient based on a positive nasopharyngeal polymerase chain reaction (PCR) swab and/or positive serology (if this was tested prior to vaccination). Patients were emailed a baseline questionnaire as a PDF file to complete independently prior to their first assessment. Patients had the option to either answer questions electronically or complete them on paper. The baseline questionnaire elicits information about employment status, ethnicity, date of COVID-19 symptom onset, and current symptoms, and contains standardized patient-reported outcome measures.
In this cross-sectional study, we included consecutive adult patients who tested positive for COVID-19 between March 1, 2020 and July 17, 2021, and completed their baseline questionnaire at least 28 days after testing positive. These dates encompassed the first three waves of COVID-19 in BC, and during this period, there were 149,308 total cases reported in the province, of which 8117 (5.4%) were hospitalized and 1847 (1.2%) required ICU [24]. Patients were excluded if there was missing information about the date of the confirmed positive COVID-19 test or if their COVID-19 hospitalization history was not known. We analyzed the previously hospitalized and non-hospitalized patients in parallel as two independent cohorts given their different referral criteria.

Outcome measures
The FSS is a self-administered instrument that takes about 8 min to complete [25]. It includes 9 items, each consisting of a statement for which respondents are asked to indicate their level of agreement from 1 (strongly disagree) to 7 (strongly agree) [20]. Higher scores for each item indicate greater fatigue severity. We scored the FSS by calculating the mean score of the nine items [26]. We computed a score if ≥ 8 items were completed, which is considered acceptable given that FSS items are unidimensional and strongly correlated with each other [27]. An FSS score ≥ 4 indicates clinically important fatigue [21,28]. The EQ-5D-5L measures health-related quality of life (HRQOL) based on five items that each represent a domain (Mobility, Self Care, Usual Activities, Pain/Discomfort, and Anxiety/Depression) [29]. Patients rate their health status on a five-point scale for each domain (no problems, slight problems, moderate problems, severe problems or extreme problems). The combination of responses defines a "health state", from which a health utility score is calculated using a value set algorithm derived from the preferences of a particular population. In this study, we derived health utilities from patient responses using a Canadian value set, in which scores range from −0.148 for the worst health state to 0.949 for the best [30]. The EQ-5D-5L also includes a visual analogue scale (VAS) on which patients indicate their health that day from 0 (worst health imaginable) to 100 (best health imaginable) [29].
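The FSS scoring rule described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code (the analyses were run in Excel and SPSS); the function and constant names are hypothetical, and averaging over the eight answered items when exactly one of the nine is missing is an assumption about how the ≥ 8-item rule was applied.

```python
from typing import Optional, Sequence

FSS_N_ITEMS = 9            # nine statements, each rated 1 (strongly disagree) to 7 (strongly agree)
FSS_MIN_ANSWERED = 8       # a score is computed only if at least 8 items were answered
FSS_FATIGUE_CUTOFF = 4.0   # FSS >= 4 taken as clinically important fatigue


def fss_score(items: Sequence[Optional[int]]) -> Optional[float]:
    """Mean of the answered FSS items, or None if fewer than 8 were answered.

    `items` holds the nine responses in order; None marks an unanswered item.
    Assumption: with one item missing, the mean is taken over the 8 answered items.
    """
    if len(items) != FSS_N_ITEMS:
        raise ValueError(f"expected {FSS_N_ITEMS} items, got {len(items)}")
    answered = [x for x in items if x is not None]
    if any(not 1 <= x <= 7 for x in answered):
        raise ValueError("FSS item responses must lie between 1 and 7")
    if len(answered) < FSS_MIN_ANSWERED:
        return None
    return sum(answered) / len(answered)


def is_fatigued(score: Optional[float]) -> Optional[bool]:
    """Apply the FSS >= 4 cut-off; None when no score could be computed."""
    if score is None:
        return None
    return score >= FSS_FATIGUE_CUTOFF


# Example: one item skipped, so the mean is taken over the 8 answered items.
print(fss_score([5, 6, 4, 7, 5, 6, 5, 4, None]))  # 42 / 8 = 5.25
```

Returning None rather than 0 for an uncomputable score keeps such questionnaires out of means and cut-off counts while still letting them be counted in the data-quality denominator.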
The PHQ-2 is a widely used screening instrument for depression that consists of two items that ask about depressed mood and anhedonia [31]. Patients are asked to indicate the frequency of each symptom over the past 2 weeks, from 0 (not at all) to 3 (nearly every day). The maximum total score is 6, and a score ≥ 3 is considered a positive screen, with 92% specificity for detecting major depression [32].
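The PHQ-2 screening rule reduces to a sum and a threshold. The helper below is a hypothetical sketch that assumes both item responses are already coded 0 to 3.

```python
PHQ2_POSITIVE_CUTOFF = 3  # total >= 3 counted as a positive depression screen


def phq2_total(depressed_mood: int, anhedonia: int) -> int:
    """Sum of the two PHQ-2 items, each coded 0 (not at all) to 3 (nearly every day)."""
    for item in (depressed_mood, anhedonia):
        if not 0 <= item <= 3:
            raise ValueError(f"PHQ-2 item out of range 0-3: {item}")
    return depressed_mood + anhedonia


def phq2_positive(depressed_mood: int, anhedonia: int) -> bool:
    """True when the total (maximum 6) reaches the screening cut-off of 3."""
    return phq2_total(depressed_mood, anhedonia) >= PHQ2_POSITIVE_CUTOFF
```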
The questionnaire also contained two SISQs that screened for fatigue. First, "fatigue" was listed as part of a symptom inventory in which respondents indicated with a check box whether the symptom was currently present. In a subsequent section titled "medical status", respondents were asked to indicate "yes" vs. "no" on whether they had particular conditions or problems from a list, and one of these items was "always feeling tired".

Statistical analyses
The methods used to evaluate the psychometric properties of the FSS in this study have been applied previously in the context of other disease groups and healthy populations [25,28,33,34,35]. The hospitalized and non-hospitalized patient cohorts were assessed in parallel using the same methods. The analyses were conducted using Microsoft Excel and IBM SPSS.

Data quality and distributions
Data quality was assessed by calculating the proportion of FSS questionnaires that had missing item scores and the proportion for which mean FSS scores could be computed. We examined the distribution of scores by calculating the mean, standard deviation and skewness for individual items and for the overall FSS score, and by assessing for floor and ceiling effects. Typically, floor or ceiling effects are considered present if more than 15% of respondents have the minimum (FSS of 1) or maximum (FSS of 7) score, respectively [36].
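The data-quality and distribution checks above can be sketched in Python. This is an illustrative sketch, not the authors' Excel/SPSS workflow: the function names are hypothetical, and the skewness uses the adjusted Fisher-Pearson (G1) estimator, the same formula as Excel's SKEW(); the exact estimator used in the study is an assumption.

```python
import math
from statistics import mean, stdev
from typing import Dict, Optional, Sequence


def pct_computable(scores: Sequence[Optional[float]]) -> float:
    """Percentage of returned questionnaires with a computable FSS score (None = not computable)."""
    return 100.0 * sum(s is not None for s in scores) / len(scores)


def sample_skewness(xs: Sequence[float]) -> float:
    """Adjusted Fisher-Pearson skewness (G1); negative values mean a longer left tail."""
    n = len(xs)
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def describe(xs: Sequence[float]) -> Dict[str, float]:
    """Mean, sample SD and skewness of an item or of the overall FSS score."""
    return {"mean": mean(xs), "sd": stdev(xs), "skewness": sample_skewness(xs)}


def floor_ceiling(xs: Sequence[float], lowest: float = 1.0, highest: float = 7.0,
                  threshold_pct: float = 15.0) -> Dict[str, object]:
    """Share of respondents at the minimum (floor) and maximum (ceiling) score,
    and whether each share exceeds the threshold.

    Exact equality is safe here: an FSS mean of all-1 or all-7 items is exactly 1.0 or 7.0.
    """
    n = len(xs)
    floor_pct = 100.0 * sum(x == lowest for x in xs) / n
    ceiling_pct = 100.0 * sum(x == highest for x in xs) / n
    return {
        "floor_pct": floor_pct,
        "floor_effect": floor_pct > threshold_pct,
        "ceiling_pct": ceiling_pct,
        "ceiling_effect": ceiling_pct > threshold_pct,
    }
```

`pct_computable` takes every returned questionnaire, whereas `describe` and `floor_ceiling` should be given only the computable scores.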

Internal consistency
The internal consistency of the FSS was assessed by measuring the correlation between each pair of items and between each item and the overall FSS score. The item-total correlation was corrected for overlap by computing the correlation between each item and the mean of all other FSS items. A separate Cronbach's alpha was calculated for each patient cohort, with additional calculations leaving out each individual FSS item in turn. A Cronbach's alpha > 0.9 was considered to indicate appropriate internal consistency [37].
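The three internal-consistency quantities described above (Cronbach's alpha, overlap-corrected item-total correlation, and alpha with each item deleted) can be sketched in Python for complete cases. The function names are hypothetical and the study itself used SPSS. Correlating an item with the mean of the other items gives the same value as correlating it with their sum, because Pearson correlation is unaffected by rescaling.

```python
from statistics import mean, variance
from typing import Sequence

Rows = Sequence[Sequence[float]]  # rows = respondents, columns = FSS items (complete cases)


def cronbach_alpha(rows: Rows) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows: Rows, j: int) -> float:
    """Correlation of item j with the mean of the remaining items (removes item-total overlap)."""
    item = [r[j] for r in rows]
    rest = [(sum(r) - r[j]) / (len(r) - 1) for r in rows]
    return pearson(item, rest)


def alpha_if_deleted(rows: Rows, j: int) -> float:
    """Cronbach's alpha recomputed with item j left out."""
    return cronbach_alpha([[v for i, v in enumerate(r) if i != j] for r in rows])
```

If dropping an item raises alpha noticeably, that item is contributing less to the shared construct than the others.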

Construct validity
Construct validity is assessed by comparing the measure of interest with constructs that are known to be positively associated (i.e. convergent validity) and with constructs that are known to be unrelated or negatively associated (i.e. divergent validity) [38,39]. In other disease populations, the construct validity of the FSS has been assessed through comparison with patient-reported symptoms of depression and EQ-5D scores [33,34]. One study also demonstrated that, among the EQ-5D dimensions, the FSS had the strongest negative correlation with the "usual activities" dimension [40]. In this study, we hypothesized that patients who screened positive for depression on the PHQ-2 would have higher FSS scores, and that there would be inverse relationships between FSS and HRQOL and between FSS and ability to perform usual activities. Spearman correlation was used to quantify the associations of the FSS with the EQ-5D-5L health utility and with the EQ-5D VAS score. The Mann-Whitney U test and the Kruskal-Wallis test were used to assess between-group differences in FSS scores by PHQ-2 screening result and by EQ-5D-5L usual activities level, respectively.
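The rank-based statistics named above can be illustrated with a short, dependency-free Python sketch. It is not the authors' SPSS procedure, and the function names are hypothetical. Spearman's rho is computed as the Pearson correlation of tie-averaged ranks, and the Mann-Whitney U statistic comes from the rank sum of one group. P-values, and the Kruskal-Wallis test across the usual-activities levels, would in practice come from a statistics package such as `scipy.stats.spearmanr`, `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal`.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mann_whitney_u(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """U for group_a: number of (a, b) pairs with a > b, counting ties as 1/2."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    return sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
```

For example, FSS scores from PHQ-2-positive patients passed as `group_a` against PHQ-2-negative patients as `group_b` give a U close to `len(group_a) * len(group_b)` when the positive screens almost always have the higher FSS.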

Evaluation of single item screening questions
We examined how well responses to these SISQs classified fatigue compared with the FSS, using the FSS cut-off score of ≥ 4 as the reference standard. This was done by calculating the sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), positive predictive value (PPV), and negative predictive value (NPV). Among patients who answered both SISQs on the questionnaire, we assessed the degree of agreement between these questions using Cohen's kappa statistic. A kappa statistic of 0.41-0.60 is considered moderate, 0.61-0.80 substantial, and 0.81-1.0 almost perfect agreement [41].
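The accuracy measures listed above all derive from a 2 × 2 table of SISQ answer against FSS-defined fatigue, and Cohen's kappa from a 2 × 2 table of the two SISQs against each other. The Python sketch below illustrates these standard definitions; the class and function names and the counts in any example are hypothetical, not study data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TwoByTwo:
    """SISQ answer cross-classified against the FSS reference standard (FSS >= 4)."""
    tp: int  # SISQ yes, FSS >= 4
    fp: int  # SISQ yes, FSS < 4
    fn: int  # SISQ no,  FSS >= 4
    tn: int  # SISQ no,  FSS < 4

    def metrics(self) -> Dict[str, float]:
        sens = self.tp / (self.tp + self.fn)
        spec = self.tn / (self.tn + self.fp)
        return {
            "sensitivity": sens,
            "specificity": spec,
            "PLR": sens / (1 - spec),   # undefined when specificity is 1
            "NLR": (1 - sens) / spec,
            "PPV": self.tp / (self.tp + self.fp),
            "NPV": self.tn / (self.tn + self.fn),
        }


def cohens_kappa(both_yes: int, only_first_yes: int, only_second_yes: int, both_no: int) -> float:
    """Cohen's kappa for two yes/no questions answered by the same patients."""
    n = both_yes + only_first_yes + only_second_yes + both_no
    p_observed = (both_yes + both_no) / n
    p_first_yes = (both_yes + only_first_yes) / n
    p_second_yes = (both_yes + only_second_yes) / n
    # agreement expected by chance if the two answers were independent
    p_expected = p_first_yes * p_second_yes + (1 - p_first_yes) * (1 - p_second_yes)
    return (p_observed - p_expected) / (1 - p_expected)
```

Kappa discounts the agreement two questions would reach by chance alone, which is why it can be moderate even when raw agreement looks high.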

Internal consistency
In both the hospitalized and non-hospitalized patients, the Cronbach's alpha for the FSS was 0.96. The individual item correlations and Cronbach's alpha analyses for the hospitalized and non-hospitalized groups are shown in Additional file 1: Table S1.

Construct validity
Construct validity of the FSS was very good in both groups. FSS scores showed a moderate negative correlation with HRQOL as measured by the EQ-5D-5L (Table 3). In the hospitalized patients, the Spearman correlation between the FSS and EQ-5D-5L VAS scores was −0.5 (p < 0.001) and between the FSS and EQ-5D-5L health utility (HU) scores was −0.6 (p < 0.001). In the non-hospitalized group, these correlations were −0.5 (p < 0.001) and −0.6 (p < 0.001), respectively. The Usual Activities dimension of the EQ-5D-5L was used to compare the FSS with level of functioning (Table 4). In both the hospitalized and non-hospitalized groups, the mean FSS differed between levels of functioning, generally increasing with greater dysfunction (p < 0.001 for both groups). The FSS was also higher among patients who had a positive depression screen (PHQ-2 score ≥ 3). As shown in Table 4, among the hospitalized cohort, the mean FSS score was 5.4 among those with positive screens vs. 4.0 among those with negative screens (p < 0.001). Similarly, in the non-hospitalized patients, the mean FSS was 5.9 in patients who screened positive compared with 4.9 in those who did not (p < 0.001).

Evaluation of single item screening questions
A total of 330 (62.5%) hospitalized and 423 (78.9%) non-hospitalized patients seen in clinic were classified as fatigued according to the FSS cut-off score ≥ 4 (Additional file 1: Table S2). For the hospitalized and non-hospitalized cohorts, respectively, the sensitivity of the "fatigue present" SISQ for classifying fatigue was 70.6% and 83.2%, whereas the specificity was 70.2% and 57.5%. The "always feeling tired" SISQ had a sensitivity of 70.5% and 89.6% and a specificity of 76.4% and 58.7%. The positive and negative likelihood ratios were also calculated (Table 5), as were the positive and negative predictive values (Additional file 1: Table S3).
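Likelihood ratios depend only on sensitivity and specificity, whereas predictive values also depend on how common fatigue is in the group being screened. The Python sketch below approximately reconstructs both from the rounded percentages above. It assumes that the FSS-defined proportions (62.5% and 78.9%) also apply to the patients who answered each SISQ, so its printed values only approximate Table 5 and Additional file 1: Table S3, which remain authoritative. The function names are hypothetical.

```python
from typing import Tuple


def likelihood_ratios(sens: float, spec: float) -> Tuple[float, float]:
    """PLR = sens / (1 - spec); NLR = (1 - sens) / spec."""
    return sens / (1 - spec), (1 - sens) / spec


def predictive_values(sens: float, spec: float, prev: float) -> Tuple[float, float]:
    """PPV and NPV via Bayes' rule for a given prevalence of FSS-defined fatigue."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv


# (sensitivity, specificity, proportion with FSS >= 4), rounded as reported in the text;
# using the FSS proportion as the prevalence among SISQ responders is an assumption
REPORTED = {
    ("hospitalized", "fatigue present"): (0.706, 0.702, 0.625),
    ("hospitalized", "always feeling tired"): (0.705, 0.764, 0.625),
    ("non-hospitalized", "fatigue present"): (0.832, 0.575, 0.789),
    ("non-hospitalized", "always feeling tired"): (0.896, 0.587, 0.789),
}

for (cohort, question), (sens, spec, prev) in REPORTED.items():
    plr, nlr = likelihood_ratios(sens, spec)
    ppv, npv = predictive_values(sens, spec, prev)
    print(f"{cohort:<17} {question:<21} PLR={plr:.2f} NLR={nlr:.2f} PPV={ppv:.2f} NPV={npv:.2f}")
```

For a fixed sensitivity and specificity, a higher prevalence raises the PPV and lowers the NPV, so predictive values are not directly comparable between the two cohorts.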
Inter-item agreement between the two SISQs was also assessed for each patient group (Additional file 1: Table S4). The kappa statistic was 0.4 for both the hospitalized and non-hospitalized groups, indicating moderate agreement.

Discussion
In this study, we performed validation assessments of the FSS instrument and two SISQs for fatigue in patients recovering from COVID-19. These assessments were completed as part of standardized evaluations within a learning health system clinical care model, and included outpatients from across BC who either were hospitalized for acute COVID-19, or were not hospitalized, but were referred for persistent symptoms. Based on current CDC and NICE criteria, the group of non-hospitalized patients would all be classified as having long COVID [42,43]. To our knowledge, this is the largest observational study in which the FSS was used in post-hospitalization COVID-19 and/or long COVID patients.
Our findings highlight that fatigue is both common and severe in those recovering from COVID-19. To provide context regarding the degree of fatigue severity, we can make crude comparisons between our FSS scores and those reported in the literature for healthy populations and other disease groups. For example, the fatigue severity in the hospitalized patients (mean FSS score 4.4) was over one standard deviation above the mean previously reported for healthy individuals (FSS score 3.0) [28]. The mean FSS of 5.2 in non-hospitalized patients was nearly two standard deviations above this norm, and is as high or nearly as high as what has been reported in the largest studies of conditions in which fatigue is a cardinal symptom, such as post-polio syndrome (FSS score 5.2) [44], chronic fatigue syndrome (CFS) (FSS score 6.0) and fibromyalgia (FSS score 5.9) [45]. However, these relatively high FSS scores also produced negatively skewed distributions in both patient groups. This pattern was particularly pronounced in the non-hospitalized patients, in whom we identified substantial ceiling effects (16.1%), and this may represent a limitation for using the FSS in future clinical studies [36]. Specifically, it will be challenging to differentiate levels of fatigue severity among patients who have maximum or near-maximum scores, and to assess responsiveness to change [36]. Researchers should take this into account by considering non-parametric tests and data transformation. Negatively skewed data are also a recognized limitation of generic fatigue instruments in patients with CFS [46], but several of these instruments, including the FSS, continue to be used in clinical trials [47,48].
Although the psychometric properties of the FSS have previously been assessed in multiple patient groups [26,33], it was important to assess them specifically in patients following COVID-19. Modern psychometric theory emphasizes that the performance of a particular survey instrument like the FSS is not a fixed property of the scale itself, but rather a function of the scale, the circumstances of administration and the specific group of respondents [38]. In this study, the FSS was acceptable to these patients, as we were able to compute scores in over 96% of respondents. The FSS showed strong internal consistency, with a high Cronbach's alpha and high inter-item correlations. It also demonstrated construct validity: similar to what has been reported in other health conditions, there was a moderate negative correlation with EQ-5D health utility and VAS scores [27,40,49]. As expected, the FSS was higher in individuals with greater impairment of their usual activities and in those who screened positive for depression [25,27,40].
The performance of screening questions for fatigue had not been specifically investigated in COVID-19 patients prior to this study, despite their widespread use during the pandemic. Two SISQs were evaluated in this study in relation to the FSS. The first SISQ used the term "fatigue", and the second used the phrase "always feeling tired". Our analyses revealed that in the non-hospitalized group, both SISQs had relatively high sensitivities (> 80%) for identifying fatigue (FSS score ≥ 4), but low specificities (< 60%). In contrast, in the hospitalized patients both SISQs had moderate sensitivities and specificities (all 60-80%). This finding is important to highlight given that several highly cited studies in the post COVID-19 literature are based on hospitalized patients and examined fatigue using similar screening questions [7,8,9,10]. The higher false negative rate in the hospitalized group suggests that the prevalence of fatigue reported by these studies may be underestimated. Ultimately, more comprehensive instruments such as the FSS are required to fully capture the number of patients who report fatigue.
Of the two SISQs, the one using the phrase "always feeling tired" had slightly better performance, with sensitivities and positive likelihood ratios that were either higher than or nearly identical to those of the SISQ that used only the term "fatigue". It is interesting that this SISQ more effectively identified patients with fatigue as defined by the FSS despite not using the term "fatigue". This may indicate that longer, more descriptive statements are more effective at capturing the presence of symptoms than single words (like "fatigue", "pain", "depression", etc.). Our finding of moderate but not strong agreement between responses to these SISQs is another indication that, although similar, these two SISQs are not always interpreted identically, and researchers must consider word choice carefully when developing symptom inventories for their questionnaires.
Our study had several elements that increase its generalizability and therefore its applicability to future studies. Firstly, it included multiple centres and comprised a diverse group of patients from throughout BC who completed the FSS as part of clinical care. Secondly, the stringent referral criteria ensured that the analyses were limited to patients who were referred by a clinician and were confirmed to have had COVID-19. This contrasts with studies in which participation is voluntary or in which COVID-19 status is self-reported. These other study approaches are likely subject to greater collider bias [50] and to the inclusion of patients who erroneously report their COVID-19 status [51]. Lastly, these findings can be applied to groups of either hospitalized or non-hospitalized patients, as these were analyzed separately in this study.
However, the study had limitations that must be acknowledged. Firstly, there are both referral and nonresponse biases that affected the composition of the study groups. Patients that were more symptomatic were likely overrepresented as they were more likely to have been referred by their physician, and this was especially true in the non-hospitalized group. Furthermore, the inclusion criteria permitted a wide range of follow-up time points relative to initial COVID-19 illness, and this is in part a reflection of the lack of precise definition for